Building a Generic Graph-based Descriptor Set for use in Drug Discovery

Authors

  • Phillip Lock
  • Nicolas Le Mercier
  • Jiuyong Li
  • Markus Stumptner
Abstract

The ability to predict drug activity from molecular structure is an important field of research both in academia and in the pharmaceutical industry. Raw 3D structure data is not in a form suitable for identifying properties using machine learning, so it must be reconfigured into descriptor sets that continue to encapsulate important structural properties of the molecule. In this study, a large number of small molecule structures, obtained from publicly available databases, were used to generate a set of molecular descriptors that can be used with machine learning to predict drug activity. The descriptors were for the most part simple graph strings representing chains of connected atoms. With molecules averaging about seventy atoms each and a dataset of just over one million molecules, this produced a very large set of simple graph strings with lengths of two to twelve atoms. Duplicates and reverse strings were eliminated and feature reduction techniques were applied, reducing the path count to about three thousand, which was viable for machine learning. Training data from twenty-six data sets was used to build decision tree classifiers using J48 and Random Forest. Forty-three thousand molecules from the NCI HIV dataset were used with the descriptor set to generate decision tree models with good accuracy. A similar algorithm was used to extract ring structures in the molecules. Inclusion of thirteen ring structure descriptors increased the accuracy of prediction.
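
The path-string idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `path_descriptors`, the `-` separator, the atom and bond list inputs, and the choice of the lexicographically smaller string as the canonical form are all assumptions made for illustration, and bond types and the later feature reduction step are ignored.

```python
from typing import Dict, List, Set, Tuple


def path_descriptors(atoms: List[str], bonds: List[Tuple[int, int]],
                     min_len: int = 2, max_len: int = 12) -> Set[str]:
    """Enumerate simple atom paths and return direction-independent strings.

    Hypothetical helper: atoms are element symbols, bonds are index pairs.
    """
    # Build an adjacency list from the bond list.
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    found: Set[str] = set()

    def extend(path: List[int]) -> None:
        if len(path) >= min_len:
            labels = [atoms[i] for i in path]
            forward = "-".join(labels)
            backward = "-".join(reversed(labels))
            # Assumed canonical form: a path and its reverse are stored once.
            found.add(min(forward, backward))
        if len(path) == max_len:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # simple paths only: no atom is revisited
                extend(path + [nxt])

    # Start a depth-first walk from every atom; the set removes duplicates.
    for start in range(len(atoms)):
        extend([start])
    return found


# Heavy atoms of ethanol as a toy input: C-C-O.
ethanol = path_descriptors(["C", "C", "O"], [(0, 1), (1, 2)])
```

Using a set handles duplicate elimination, and comparing each string with its reverse handles the reverse-string elimination mentioned in the abstract. Reversing the label list rather than the joined string keeps multi-character symbols such as Cl intact.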

Similar resources

A Novel Molecular Descriptor Derived from Weighted Line Graph

The Bertz indices, derived by counting the number of connecting edges of line graphs of a molecule, were used in deriving the QSPR models for the physicochemical properties of alkanes. The inability of these indices to identify the hetero centre in a chemical compound restricted their applications to hydrocarbons only. In the present work, a novel molecular descriptor has been derived from the w...

fmcsR: a Flexible Maximum Common Substructure Algorithm for Advanced Compound Similarity Searching

Maximum common substructure (MCS) algorithms rank among the most sensitive and accurate methods for measuring structural similarities among small molecules. This utility is critical for many research areas in drug discovery and chemical genomics. The MCS problem is a graph-based similarity concept that is defined as the largest substructure (sub-graph) shared among two compounds (Cao et al., 20...

A rotation-translation invariant molecular descriptor of partial charges and its use in ligand-based virtual screening

BACKGROUND: Measures of similarity for chemical molecules have been developed since the dawn of chemoinformatics. Molecular similarity has been measured by a variety of methods including molecular descriptor based similarity, common molecular fragments, graph matching and 3D methods such as shape matching. Similarity measures are widespread in practice and have proven to be useful in drug discov...

Use of Structure Codes (Counts) for Computing Topological Indices of Carbon Nanotubes: Sadhana (Sd) Index of Phenylenes and its Hexagonal Squeezes

Structural codes vis-a-vis structural counts, like polynomials of a molecular graph, are important in computing graph-theoretical descriptors which are commonly known as topological indices. These indices are most important for characterizing carbon nanotubes (CNTs). In this paper we have computed Sadhana index (Sd) for phenylenes and their hexagonal squeezes using structural codes (counts). Sa...

Discovery of Novel Glucagon Receptor Antagonists Using Combined Pharmacophore Modeling and Docking

Glucagon and the glucagon receptor are the most important molecules in the control of blood glucose concentrations. These two molecules are very important to studies of type 2 diabetic patients. In the literature, several classes of small molecule antagonists of the human glucagon receptor have been reported. Glucagon receptor antagonists could decrease hepatic glucose output and improve glucose control in d...

Publication date: 2009